Perf #856: inline volatile cancellation check at loop backedges #874
Merged
The per-iteration `call $Runtime.CheckCancellation()` emitted at every loop
backedge sat in every loop body as an optimization barrier: RyuJIT won't
inline CheckCancellation (it contains newobj+throw), so the bare call was
measured at ~half the runtime of a tight numeric loop.
Inline the field test on the hot path and only call the throwing helper on
the cold cancel path:
    volatile. ldsfld _cancelRequested
    brfalse   <loop body>
    call      CheckCancellation()   // cold: only when cancelling
The volatile. prefix is mandatory: _cancelRequested is loop-invariant, so a
plain ldsfld could be hoisted out of the loop by LICM, reading the flag once
and never re-checking — silently reintroducing the #74 async-hang. Volatile
forbids the hoist at zero measured cost.
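A minimal sketch of how a backedge sequence like this might be emitted with System.Reflection.Emit. The class names RuntimeSketch and BackedgeEmitterSketch and the counting loop are hypothetical stand-ins for illustration; the project's actual emitter is not shown in this PR text and may be structured differently.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;
using System.Threading;

// Hypothetical stand-in for $Runtime; names mirror the PR text only.
public static class RuntimeSketch
{
    public static bool _cancelRequested;

    // Cold path: the throw lives here, so the loop body never contains it.
    public static void CheckCancellation()
    {
        if (Volatile.Read(ref _cancelRequested))
            throw new OperationCanceledException();
    }
}

public static class BackedgeEmitterSketch
{
    // Emits: long CountTo(long n) { long i = 0; while (i < n) { i++; <check> } return i; }
    public static Func<long, long> BuildCountingLoop()
    {
        var dm = new DynamicMethod("CountTo", typeof(long), new[] { typeof(long) });
        ILGenerator il = dm.GetILGenerator();
        FieldInfo flag = typeof(RuntimeSketch).GetField(nameof(RuntimeSketch._cancelRequested));
        MethodInfo helper = typeof(RuntimeSketch).GetMethod(nameof(RuntimeSketch.CheckCancellation));

        LocalBuilder i = il.DeclareLocal(typeof(long));
        Label loopTop = il.DefineLabel();
        Label loopCond = il.DefineLabel();

        il.Emit(OpCodes.Ldc_I8, 0L);
        il.Emit(OpCodes.Stloc, i);
        il.Emit(OpCodes.Br, loopCond);

        il.MarkLabel(loopTop);
        il.Emit(OpCodes.Ldloc, i);          // loop body: i++
        il.Emit(OpCodes.Ldc_I8, 1L);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, i);

        // Backedge check. In this sketch the hot path branches to the loop
        // condition rather than straight to the loop body.
        il.Emit(OpCodes.Volatile);          // prefix: the next load may not be hoisted or cached
        il.Emit(OpCodes.Ldsfld, flag);
        il.Emit(OpCodes.Brfalse, loopCond); // hot path: flag clear, keep looping
        il.Emit(OpCodes.Call, helper);      // cold path: only when cancelling

        il.MarkLabel(loopCond);
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Blt, loopTop);

        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Ret);
        return (Func<long, long>)dm.CreateDelegate(typeof(Func<long, long>));
    }
}
```

Calling the delegate with the flag clear runs the loop to completion; setting RuntimeSketch._cancelRequested beforehand makes the first backedge take the cold call and throw.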
Results (branch vs main vs Node):
    factorial tight loop   1027 -> 665 ms   (1.6x;  Node 210)
    count-primes @100k      403 -> 359 ms   (1.12x; Node 264)
Correctness: compiled while(true) unwinds with OperationCanceledException
the instant _cancelRequested is tripped via reflection (the #74 harness
path); vm-timeout tests pass; Test262 Timeout=0. Loop-backedge emitter only;
the dynamic-invocation recursion guard (EmitStackGuard) is left as-is.
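The correctness check described above could look roughly like the following sketch: an infinite loop polls a static flag with a volatile read, and a background task trips the flag through reflection after a delay. CancelHarnessSketch and its members are hypothetical; this is not the #74 harness itself, only an illustration of the same idea at the C# source level.

```csharp
using System;
using System.Reflection;
using System.Threading;
using System.Threading.Tasks;

public static class CancelHarnessSketch
{
    // Hypothetical flag, named after the field in the PR text.
    private static bool _cancelRequested;

    // Cold helper: the only place that throws.
    private static void CheckCancellation()
    {
        if (Volatile.Read(ref _cancelRequested))
            throw new OperationCanceledException();
    }

    // Stand-in for a compiled while(true) with the inline check at the backedge.
    private static void SpinForever()
    {
        while (true)
        {
            if (Volatile.Read(ref _cancelRequested))
                CheckCancellation();
        }
    }

    // Returns true if the loop unwound with OperationCanceledException.
    public static bool RunOnce(int delayMs)
    {
        FieldInfo flag = typeof(CancelHarnessSketch).GetField(
            "_cancelRequested", BindingFlags.NonPublic | BindingFlags.Static);
        flag.SetValue(null, false);

        // Trip the flag from another thread, via reflection, after a delay.
        Task trip = Task.Run(() =>
        {
            Thread.Sleep(delayMs);
            flag.SetValue(null, true);
        });

        try
        {
            SpinForever();
            return false;
        }
        catch (OperationCanceledException)
        {
            trip.Wait();
            return true;
        }
    }
}
```

If the read in SpinForever were a plain field load, the JIT would be free to hoist it and the loop might never observe the write; the volatile read is what makes the test terminate reliably.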
nickna added a commit that referenced this pull request on Jun 21, 2026:
…llation-check inlining
Compiled output now meets/beats Node on 5/7 benchmark workloads; the two stragglers (count-primes ~1.3x, factorial ~3x) are bounded by separate non-codegen factors. Records the #874 inline-volatile loop-cancellation win (1.6x tight loops / 1.12x sieve) and the rejected throttle variant.
Summary

Closes part of #856 (the tracked "next lever": the per-iteration $Runtime::CheckCancellation() call).

Every loop backedge emitted an unconditional call $Runtime.CheckCancellation(). RyuJIT won't inline CheckCancellation (it contains newobj + throw), so that bare call sat in every loop body as a per-iteration optimization barrier, measured at ~half the runtime of a tight numeric loop.

This inlines the field test on the hot path and only calls the throwing helper on the cold cancel path: a volatile. ldsfld _cancelRequested, a brfalse back to the loop body, and a call CheckCancellation() that runs only when cancelling.

The volatile. prefix is mandatory for correctness: _cancelRequested is loop-invariant, so a plain ldsfld could be hoisted out of the loop by RyuJIT's LICM, reading the flag once and never re-checking, silently reintroducing the #74 async-hang. The volatile read forbids the hoist and measured at zero cost. Only the loop-backedge emitter changed; the cold throw stays in the helper, and the dynamic-invocation recursion guard (EmitStackGuard) is left as a plain call (not per-iteration-hot; directly-compiled calls bypass it).

Results (branch vs main vs Node)
The win scales with the loop's arithmetic fraction (biggest on pure-numeric loops, smaller where array ops dominate). No regression on other workloads.
Correctness — verified directly
A while(true) loop ran 1s, then unwound with OperationCanceledException the instant _cancelRequested was tripped via reflection (the exact test-harness path).

vm-timeout tests pass; Test262 Timeout=0.

dotnet test: 13994 pass. The Test262 baseline-drift failures (2 interpreted / 4 compiled, all in Array.isArray / Math / Proxy families) reproduce identically on main: a pre-existing stale baseline, not from this change (verified by stash + rebuild + re-run). The 6 compiled-mode unit failures pass in isolation (parallel DLL-build contention flakiness).

SharpTS.TypeScriptConformance: green (a codegen change can't affect the type-checker).

Follow-ups (not in this PR)
vars.